Combination of diverse subword units in spoken term detection
نویسندگان
چکیده
This paper focuses on the following two points: First, we try to clarify the effect of combination systems from two aspects, accuracy and heterogeneity. And then we evaluate our unique subword unit, called Sub-Phonetic Segment (SPS) to maximize performance improvement by combination. Combination systems usually yield higher performance than any individual system. When the systems being combined are individually accurate but also mutually heterogeneous, the improvement by combination can be maximized. From this consideration, we estimate heterogeneity by correlation of false alarm errors of combined systems and confirm that lower correlation of two systems yields the better performance improvement by combination. Comparative tests of several combination approaches are carried out on subword-based spoken term detection. Since subword-based systems use constrained linguistic knowledge, it is fairly straightforward to verify the heterogeneity of combined systems. Experimental results show that the most significant improvements can be achieved by combination of two different subword units, triphone and SPS, which are highly heterogeneous subword units with low correlation of false alarm detections.
منابع مشابه
An STD system for OOV query terms using various subword units
We have been proposing a Spoken Term Detection (STD) method for Out-Of-Vocabulary (OOV) query terms using various subword units, such as monophone, triphone, demiphone, one third phone, and Sub-phonetic segment (SPS) models. In the proposed method, subword-based ASR is performed for all spoken documents and subword recognition results are generated using subword acoustic models and subword lang...
متن کاملAn STD System for OOV Query Terms Integrating Multiple STD Results of Various Subword units
We have been proposing a Spoken Term Detection (STD) method for Out-Of-Vocabulary (OOV) query terms integrating various subword recognition results using monophone, triphone, demiphone, one third phone, and Sub-phonetic segment (SPS) models. In the proposed method, subword-based ASR (Automatic Speech Recognition) is performed for all spoken documents and subword recognition results are generate...
متن کاملHybrid word-subword decoding for spoken term detection
This paper deals with a hybrid word-subword recognition system for spoken term detection. The decoding is driven by a hybrid recognition network and the decoder directly produces hybrid word-subword lattices. One phone and two multigram models were tested to represent sub-word units. The systems were evaluated in terms of spoken term detection accuracy and the size of index. We concluded that t...
متن کاملGenerating Complementary Acoustic Model Spaces in DNN-Based Sequence-to-Frame DTW Scheme for Out-of-Vocabulary Spoken Term Detection
This paper proposes a sequence-to-frame dynamic time warping (DTW) combination approach to improve out-ofvocabulary (OOV) spoken term detection (STD) performance gain. The goal of this paper is twofold: first, we propose a method that directly adopts the posterior probability of deep neural network (DNN) and Gaussian mixture model (GMM) as the similarity distance for sequence-to-frame DTW. Seco...
متن کاملMultilayer subword units for open-vocabulary spoken document retrieval
This paper describes the application of subword units in an effort of improving open-vocabulary spoken document retrieval performance in the case of highly corrupted recognition output. This paper presents the developed open-vocabulary spoken document retrieval system including the newly proposed subphonetic segment unit and combining multilayer subword units. Our experiments on Japanese spoken...
متن کامل